METHOD FOR DETERMINING GENE KNOCKOUT STRATEGIES



CROSS-REFERENCE TO RELATED APPLICATIONS 

This application is based on U.S. Patent Application Serial No. 60/395,763, filed
July 10, 2002; U.S. Patent Application Serial No. 60/417,511, filed October 9, 2002; and
U.S. Patent Application Serial No. 60/444,933, filed February 3, 2003, each of which is
herein incorporated by reference in its entirety.

GRANT REFERENCE 
This work has been supported by the Department of Energy pursuant to Grant No.
58855 and the National Science Foundation Grant No. BES0120277. Accordingly, the
U.S. government may have certain rights in the invention.

BACKGROUND OF THE INVENTION

The systematic development of engineered microbial strains for optimizing the
production of chemicals or biochemicals is an overarching challenge in biotechnology
(Stephanopoulos et al., 1998). However, in the absence of metabolic and genetic
engineering interventions, the product yields of many microorganisms are often far below
their theoretical maximums. This is expected because cellular metabolism is primed,
through natural selection, for the maximum responsiveness to the history of selective
pressures rather than for the overproduction of specific chemical compounds. Not
surprisingly, the behavior of metabolic networks is governed by internal cellular objectives
which are often in direct competition with chemical overproduction targets.

The recent explosion of annotated sequence information along with a wealth of
chemical literature has enabled the reconstruction of genome-scale metabolic networks for
many microorganisms (Edwards and Palsson, 2000; Schilling and Palsson, 2000; Schilling
et al., 2002; Forster et al., 2003). This information, used in the context of the flux balance
analysis (FBA) modeling framework (Varma and Palsson, 1993), has been employed
extensively to explore the integrated functions of metabolic networks (Burgard and
Maranas, 2001; Burgard et al., 2001; Papin et al., 2003; Price et al., 2003). FBA models
typically invoke the optimization of a particular cellular objective (e.g., ATP production
(Majewski and Domach, 1990; Ramakrishna et al., 2001), biomass formation (Varma and
Palsson, 1993, 1994), minimization of metabolic adjustment (Segre et al., 2002)), subject to
network stoichiometry, to suggest a likely flux distribution. Stoichiometric models of
Escherichia coli (E. coli) metabolism utilizing the biomass maximization hypothesis have
been in some cases successful at (i) predicting the lethality of gene knockouts (Edwards
and Palsson, 2000; Badarinarayana et al., 2001), (ii) identifying the correct sequence of
byproduct secretion under increasingly anaerobic conditions (Varma et al., 1993), and (iii)
quantitatively predicting cellular growth rates under certain conditions (Edwards et al.,
2001). Interestingly, recent work suggests that even when FBA predictions under the
biomass maximization assumption seem to fail, metabolic networks can be evolved, for
certain cases, towards maximum growth (i.e., biomass yield) through adaptive evolution
(Ibarra et al., 2002).

The ability to investigate the metabolism of single-cellular organisms at a genomic
scale, and thus systemic level, motivates the need for novel computational methods aimed
at identifying strain engineering strategies.

Thus, one object, feature, or advantage of the present invention is to provide a
method for computationally suggesting the manner in which to achieve bioengineering
objectives.

A further object, feature or advantage of the present invention is to determine
candidates for gene deletion or addition through use of a model of a metabolic network.

A still further object, feature or advantage of the present invention is to provide an 
optimized method for computationally achieving a bioengineering objective. 

Yet another object, feature or advantage of the present invention is to provide an
optimized method for computationally achieving a bioengineering objective that is robust.

One or more of these and/or other objects, features and advantages of the present
invention will become apparent after review of the following detailed description of the
disclosed embodiments and the appended claims.



BRIEF SUMMARY OF THE INVENTION 
The systematic development of engineered microbial strains for optimizing the
production of chemicals or biochemicals is an overarching challenge in biotechnology. The
advent of genome-scale models of metabolism has laid the foundation for the development
of computational procedures for suggesting genetic manipulations that lead to
overproduction. The present invention describes a computational framework for
suggesting gene deletion strategies leading to the overproduction of chemicals or
biochemicals in E. coli. This is accomplished by ensuring that a drain towards growth
resources (i.e., carbon, redox potential, and energy) is accompanied, due to stoichiometry,
by the production of a desired product. Specifically, the computational framework
identifies multiple gene deletion combinations that maximally couple a postulated cellular
objective (e.g., biomass formation) with externally imposed chemical production targets.
This nested structure gives rise to bilevel optimization problems, which are solved based
on a transformation inspired by duality theory. By coupling biomass formation with
chemical production, the framework also suggests a growth selection/adaptation
system for indirectly evolving overproducing mutants.

One embodiment of the invention is directed to a method for determining
candidates for gene deletions and additions. The method uses a model of a metabolic
network associated with an organism. The model includes a number of metabolic reactions
defining metabolite relationships. The method includes selecting a bioengineering
objective for the organism. Next, at least one cellular objective is selected.
An optimization problem is formed that couples the cellular objective with the
bioengineering objective. Finally, the optimization problem is solved to yield at least one
candidate.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 depicts the bilevel optimization structure of OptKnock. The inner problem
performs the flux allocation based on the optimization of a particular cellular objective
(e.g., maximization of biomass yield, MOMA). The outer problem then maximizes the
bioengineering objective (e.g., chemical production) by restricting access to key reactions
available to the optimization of the inner problem.

Figure 2 depicts the flux distributions of the (A) wild-type E. coli, (B) succinate
mutant B, (C) succinate mutant C, and (D) lactate mutant C networks that maximize
biomass yield under anaerobic conditions.

Figure 3 shows (A) succinate or (B) lactate production limits under anaerobic
conditions for mutant A (dash/dotted line), mutant B (dotted line), mutant C (short dashes)
and the wild-type E. coli network (solid). The production limits are obtained by separately
maximizing and minimizing succinate or lactate production for the biomass yields
available to each network. The yellow points depict the solution identified by OptKnock
(i.e., maximum chemical production at the maximum biomass yield).

Figure 4 shows the aerobic flux distributions of the (A) wild-type E. coli, (B)
mutant A, and (C) mutant B networks that maximize biomass yield. Results for mutants A
and B assume the reactions responsible for 1,3-propanediol production are available.

Figure 5 shows 1,3-propanediol (PDO) production limits under aerobic conditions
for mutant A (dash/dotted line), mutant B (dotted line), and the wild-type E. coli network
(solid). The yellow points depict the solution identified by OptKnock (i.e., maximum
chemical production at the maximum biomass yield).

Figure 6 shows a projection of the multidimensional flux space onto two dimensions.
The pink region represents flux ranges potentially reachable by both the mutant and
complete networks, while the blue region corresponds to flux distributions rendered
unreachable by the gene deletion(s). Point A represents the maximum biomass yield
solution. Point B is the solution assuming the minimization of metabolic adjustment
hypothesis for the mutant network, while point C is the solution assuming the mutant
network will maximize its biomass yield.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT 

The ability to investigate the metabolism of single-cellular organisms at a genomic
scale, and thus systemic level, motivates the need for novel computational methods aimed
at identifying strain engineering strategies. The present invention includes a computational
framework termed OptKnock for suggesting gene deletion strategies leading to the
overproduction of specific chemical compounds in E. coli. This is accomplished by
ensuring that the production of the desired chemical becomes an obligatory byproduct of
growth by "shaping" the connectivity of the metabolic network. In other words, OptKnock
identifies and subsequently removes metabolic reactions that are capable of uncoupling
cellular growth from chemical production. The computational procedure is designed to
identify not just straightforward but also non-intuitive knockout strategies by
simultaneously considering the entire E. coli metabolic network as abstracted in the in
silico E. coli model of Palsson and coworkers (Edwards and Palsson, 2000). The
complexity and built-in redundancy of this network (e.g., the E. coli model encompasses
720 reactions) necessitates a systematic and efficient search approach to combat the
combinatorial explosion of candidate gene knockout strategies.
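
To give a rough sense of the combinatorial explosion mentioned above, the number of distinct ways to remove exactly K reactions from a 720-reaction network can be counted directly. The sketch below is illustrative only: the function name `count_knockout_sets` and the range of K values are assumptions made for this example, and the count ignores the fact that many reactions are not sensible knockout candidates.

```python
from math import comb

def count_knockout_sets(n_reactions: int, k: int) -> int:
    """Number of distinct sets of exactly k reactions drawn from n_reactions."""
    return comb(n_reactions, k)

# 720 is the reaction count quoted in the text for the E. coli model.
# The K values 1..4 are an assumed, illustrative range.
counts = {k: count_knockout_sets(720, k) for k in range(1, 5)}
for k, n_sets in counts.items():
    print(f"K = {k}: {n_sets} candidate knockout sets")
```

Even for three simultaneous deletions the count is in the tens of millions, which is why exhaustive enumeration of linear programs quickly becomes impractical.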

The nested elimination framework shown in Figure 1 is developed to identify
multiple gene deletion combinations that maximally couple cellular growth objectives with
externally imposed chemical production targets. This multi-layered optimization structure
involving two competing optimal strategists (i.e., cellular objective and chemical
production) is referred to as a bilevel optimization problem (Bard, 1998). Problem
formulation specifics along with an elegant solution procedure drawing upon linear
programming (LP) duality theory are described in the Methods section. The OptKnock
procedure is applied to succinate, lactate, and 1,3-propanediol (PDO) production in E. coli
with the maximization of the biomass yield for a fixed amount of uptaken glucose
employed as the cellular objective. The obtained results are also contrasted against using
the minimization of metabolic adjustment (MOMA) (Segre et al., 2002) as the cellular
objective. Based on the OptKnock framework, it is possible to identify the most promising
gene knockout strategies and their corresponding allowable envelopes of chemical versus
biomass production in the context of succinate, lactate, and PDO production in E. coli.

A preferred embodiment of this invention describes a computational framework,
termed OptKnock, for suggesting gene deletion strategies that could lead to chemical
production in E. coli by ensuring that the drain towards metabolites/compounds necessary
for growth resources (i.e., carbons, redox potential and energy) must be accompanied, due
to stoichiometry, by the production of the desired chemical. Therefore, the production of
the desired product becomes an obligatory byproduct of cellular growth. Specifically,
OptKnock pinpoints which reactions to remove from a metabolic network, which can be
realized by deleting the gene(s) associated with the identified functionality. The procedure
was demonstrated based on succinate, lactate, and PDO production in E. coli K-12. The
obtained results exhibit good agreement with strains published in the literature. While
some of the suggested gene deletions are quite straightforward, as they essentially prune
reaction pathways competing with the desired one, many others are at first quite
non-intuitive, reflecting the complexity and built-in redundancy of the metabolic network of
E. coli. For the succinate case, OptKnock correctly suggested anaerobic fermentation and the
removal of the phosphotransferase glucose uptake mechanism as a consequence of the
competition between the cellular and chemical production objectives, and not as a direct
input to the problem. In the lactate study, the glucokinase-based glucose uptake
mechanism was shown to decouple lactate and biomass production for certain knockout
strategies. For the PDO case, results show that the Entner-Doudoroff pathway is more
advantageous than EMP glycolysis despite the fact that it is substantially less energetically
efficient. In addition, the so far popular tpi knockout was clearly shown to reduce the
maximum yields of PDO, while a complex network of 15 reactions was shown to be
theoretically capable of "leaking" flux from the PPP pathway to the TCA cycle and thus
decoupling PDO production from biomass formation. The obtained results also appeared
to be quite robust with respect to the choice for the cellular objective.

The present invention contemplates any number of cellular objectives, including but
not limited to maximizing a growth rate, maximizing ATP production, minimizing
metabolic adjustment, minimizing nutrient uptake, minimizing redox production, 
minimizing a Euclidean norm, and combinations of these and other cellular objectives. 
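
As an illustration of the "minimizing metabolic adjustment" and Euclidean norm objectives listed above, the sketch below evaluates the squared Euclidean distance between a wild-type flux vector and a few candidate mutant flux vectors, then picks the closest candidate. This is only an assumption-laden stand-in: a full MOMA calculation would minimize this distance over all feasible mutant flux distributions (a quadratic program), whereas here the candidates are hand-picked, hypothetical vectors, and the function names are invented for this example.

```python
def squared_distance(v_wild_type, v_mutant):
    """MOMA-style objective: sum over reactions of (v0_j - v_j)^2."""
    return sum((v0 - v) ** 2 for v0, v in zip(v_wild_type, v_mutant))

def closest_to_wild_type(v_wild_type, candidates):
    """Return the candidate flux vector with the smallest squared distance."""
    return min(candidates, key=lambda v: squared_distance(v_wild_type, v))

# Hypothetical flux vectors (arbitrary units), one entry per reaction.
v_wild_type = [10, 10, 0, 10]
mutant_candidates = [
    [10, 5, 5, 0],   # large redistribution of flux
    [10, 8, 2, 6],   # smaller redistribution of flux
]
nearest = closest_to_wild_type(v_wild_type, mutant_candidates)
```

Under this objective the mutant is assumed to stay as close as possible to its wild-type flux state rather than to re-optimize for growth.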

It is important to note that the suggested gene deletion strategies must be interpreted
carefully. For example, in many cases the deletion of a gene in one branch of a branched
pathway is equivalent to a significant up-regulation of the other branch. In addition,
inspection of the flux changes before and after the gene deletions provides insight as to
which genes need to be up- or down-regulated. Lastly, the problem of mapping the set of
identified reactions targeted for removal to its corresponding gene counterpart is not always
uniquely specified. Therefore, careful identification of the most economical gene set
accounting for isozymes and multifunctional enzymes needs to be made.

Preferably, in the OptKnock framework, the substrate uptake flux (i.e., glucose) is
assumed to be 10 mmol/gDW·hr. Therefore, all reported chemical production and biomass
formation values are based upon this postulated and not predicted uptake scenario. Thus, it
is quite possible that the suggested deletion mutants may involve substantially lower uptake
efficiencies. However, because OptKnock essentially suggests mutants with coupled
growth and chemical production, one could envision a growth selection system that will
successively evolve mutants with improved uptake efficiencies and thus enhanced desired
chemical production characteristics.

Where there is a lack of any regulatory or kinetic information within the purely
stoichiometric representation of the inner optimization problem that performs flux
allocation, OptKnock is used to identify any gene deletions as the sole mechanism for
chemical overproduction. Clearly, the lack of any regulatory or kinetic information in the
model is a simplification that may in some cases suggest unrealistic flux distributions. The
incorporation of regulatory information will not only enhance the quality of the suggested
gene deletions by more appropriately resolving flux allocation, but also allow us to suggest
regulatory modifications along with gene deletions as mechanisms for strain improvement.
The use of alternate modeling approaches (e.g., cybernetic (Kompala et al., 1984;
Ramakrishna et al., 1996; Varner and Ramkrishna, 1999), metabolic control analysis
(Kacser and Burns, 1973; Heinrich and Rapoport, 1974; Hatzimanikatis et al., 1998)), if
available, can be incorporated within the OptKnock framework to more accurately estimate
the metabolic flux distributions of gene-deleted metabolic networks. Nevertheless, even
without such regulatory or kinetic information, OptKnock provides useful suggestions for
strain improvement and more importantly establishes a systematic framework. The present
invention naturally contemplates future improvements in metabolic and regulatory
modeling frameworks.

Methods 

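The maximization of a cellular objective quantified as an aggregate reaction flux for
a steady state metabolic network comprising a set $\mathcal{N} = \{1, \ldots, N\}$ of metabolites and a set
$\mathcal{M} = \{1, \ldots, M\}$ of metabolic reactions fueled by a glucose substrate is expressed
mathematically as follows,

$$
\begin{aligned}
\text{maximize} \quad & v_{cellular\ objective} && \text{(Primal)} \\
\text{subject to} \quad & \sum_{j=1}^{M} S_{ij} v_j = 0, && \forall\, i \in \mathcal{N} \\
& v_{pts} + v_{glk} = v_{glc\_uptake} \ \ \text{mmol/gDW}\cdot\text{hr} \\
& v_{atp} \ge v_{atp\_main} \ \ \text{mmol/gDW}\cdot\text{hr} \\
& v_{biomass} \ge v_{biomass}^{target} \ \ \text{1/hr} \\
& v_j \ge 0, && \forall\, j \in \mathcal{M}_{irrev} \\
& v_j \le 0, && \forall\, j \in \mathcal{M}_{secr\_only} \\
& v_j \in \mathbb{R}, && \forall\, j \in \mathcal{M}_{rev}
\end{aligned}
$$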



where $S_{ij}$ is the stoichiometric coefficient of metabolite i in reaction j, $v_j$ represents the flux
of reaction j, $v_{glc\_uptake}$ is the basis glucose uptake scenario, $v_{atp\_main}$ is the non-growth
associated ATP maintenance requirement, and $v_{biomass}^{target}$ is a minimum level of biomass
production. The vector v includes both internal and transport reactions. The forward (i.e.,
positive) direction of transport fluxes corresponds to the uptake of a particular metabolite,
whereas the reverse (i.e., negative) direction corresponds to metabolite secretion. The
uptake of glucose through the phosphotransferase system and glucokinase are denoted by
$v_{pts}$ and $v_{glk}$, respectively. Transport fluxes for metabolites that can only be secreted from
the network are members of $\mathcal{M}_{secr\_only}$. Note also that the complete set of reactions $\mathcal{M}$ is
subdivided into reversible $\mathcal{M}_{rev}$ and irreversible $\mathcal{M}_{irrev}$ reactions. The cellular objective is
often assumed to be a drain of biosynthetic precursors in the ratios required for biomass
formation (Neidhardt and Curtiss, 1996). The fluxes are reported per 1 gDW·hr such that
biomass formation is expressed as g biomass produced/gDW·hr or 1/hr.
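
The steady-state condition and the sign convention for transport fluxes can be sketched on a very small example. The two-metabolite, three-reaction network below is hypothetical and chosen only for illustration, as are the function names; it is not part of the E. coli model. Transport reactions are written in the uptake direction, so a secretion flux is negative, as described above.

```python
def mass_balance(S, v):
    """Net production rate of each metabolite: sum over j of S[i][j] * v[j]."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]

def is_steady_state(S, v, tol=1e-9):
    """True if every metabolite balance is zero within the tolerance."""
    return all(abs(rate) <= tol for rate in mass_balance(S, v))

# Hypothetical network. Rows: metabolites A and B.
# Columns: A transport (uptake direction), conversion A -> B,
# B transport (uptake direction, so secretion of B is a negative flux).
S = [
    [1, -1, 0],   # A: supplied by uptake, consumed by the conversion
    [0,  1, 1],   # B: produced by the conversion, exchanged by B transport
]
v_balanced = [10.0, 10.0, -10.0]     # 10 units taken up, converted, and secreted
v_unbalanced = [10.0, 8.0, -10.0]    # A accumulates and B is depleted
```

Only flux vectors such as `v_balanced`, for which every row of the balance is zero, are admissible in the Primal problem.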

The modeling of gene deletions, and thus reaction elimination, first requires the
incorporation of binary variables into the flux balance analysis framework (Burgard and
Maranas, 2001; Burgard et al., 2001). These binary variables,

$$
y_j = \begin{cases} 1 & \text{if reaction flux } v_j \text{ is active} \\ 0 & \text{if reaction flux } v_j \text{ is not active} \end{cases}, \quad \forall\, j \in \mathcal{M}
$$

assume a value of one if reaction j is active and a value of zero if it is inactive. The
following constraint,

$$
v_j^{min} \cdot y_j \le v_j \le v_j^{max} \cdot y_j, \quad \forall\, j \in \mathcal{M}
$$

ensures that reaction flux $v_j$ is set to zero only if variable $y_j$ is equal to zero. Alternatively,
when $y_j$ is equal to one, $v_j$ is free to assume any value between a lower $v_j^{min}$ and an upper $v_j^{max}$
bound. In this study, $v_j^{min}$ and $v_j^{max}$ are identified by minimizing and subsequently maximizing
every reaction flux subject to the constraints from the Primal problem.
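
The effect of the binary switch on the flux bounds can be sketched in a few lines. The helper names and the numerical bounds below are hypothetical and serve only to illustrate how multiplying both bounds by the binary variable collapses the allowed range to zero for a deleted reaction.

```python
def switched_bounds(v_min, v_max, y):
    """Bounds implied by v_min * y <= v <= v_max * y for a binary y."""
    if y not in (0, 1):
        raise ValueError("y must be 0 or 1")
    return (v_min * y, v_max * y)

def flux_allowed(v, v_min, v_max, y, tol=1e-9):
    """True if flux v satisfies the switched bounds within a tolerance."""
    lower, upper = switched_bounds(v_min, v_max, y)
    return lower - tol <= v <= upper + tol

# Assumed bounds for one reversible reaction (arbitrary illustrative values).
active_range = switched_bounds(-5.0, 12.0, 1)    # reaction retained
deleted_range = switched_bounds(-5.0, 12.0, 0)   # reaction knocked out
```

With y equal to one the reaction keeps its full range, and with y equal to zero only a zero flux passes the check.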

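The identification of optimal gene/reaction knockouts requires the solution of a
bilevel optimization problem that chooses the set of reactions that can be accessed ($y_j = 1$)
so that the optimization of the cellular objective indirectly leads to the overproduction of the
chemical or biochemical of interest (see also Figure 1). Using biomass formation as the
cellular objective, this is expressed mathematically as the following bilevel mixed-integer
optimization problem

$$
\begin{aligned}
\underset{y_j}{\text{maximize}} \quad & v_{chemical} && \text{(OptKnock)} \\
\text{subject to} \quad & \left[\begin{aligned}
\underset{v_j}{\text{maximize}} \quad & v_{biomass} \qquad \text{(Primal)} \\
\text{subject to} \quad & \sum_{j=1}^{M} S_{ij} v_j = 0, \quad \forall\, i \in \mathcal{N} \\
& v_{pts} + v_{glk} = v_{glc\_uptake} \\
& v_{atp} \ge v_{atp\_main} \\
& v_{biomass} \ge v_{biomass}^{target} \\
& v_j^{min} \cdot y_j \le v_j \le v_j^{max} \cdot y_j, \quad \forall\, j \in \mathcal{M}
\end{aligned}\right] \\
& y_j \in \{0, 1\}, && \forall\, j \in \mathcal{M} \\
& \sum_{j \in \mathcal{M}} (1 - y_j) \le K
\end{aligned}
$$

where K is the number of allowable knockouts. The constraint $v_{biomass} \ge v_{biomass}^{target}$ ensures that the
resulting network meets a minimum biomass yield, $v_{biomass}^{target}$.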
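
The nested structure can be illustrated on a toy network by brute force: for each candidate knockout set, the inner problem (maximize biomass) is solved, and the outer loop keeps the set whose inner optimum yields the most product. This is a sketch under stated assumptions, not the solution procedure of the invention: the four-reaction network is hypothetical, all fluxes are taken as non-negative for simplicity, the inner optimum is unique by construction, and a tiny exact vertex-enumeration routine (`lp_max`, invented for this example) stands in for a real LP solver.

```python
from fractions import Fraction
from itertools import combinations

def solve_square(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination; None if A is singular."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * p for x, p in zip(M[r], M[col])]
    return [row[n] for row in M]

def lp_max(c, A_eq, b_eq, lb, ub):
    """Maximize c.v subject to A_eq v = b_eq and lb <= v <= ub.

    Enumerates vertices (n - m active bounds), so it only suits tiny problems.
    Returns (objective value, v) or None if no feasible vertex exists.
    """
    n, m = len(c), len(A_eq)
    bound_rows = []
    for j in range(n):
        unit = [1 if k == j else 0 for k in range(n)]
        bound_rows.append((unit, lb[j]))
        if ub[j] != lb[j]:
            bound_rows.append((unit, ub[j]))
    best = None
    for active in combinations(bound_rows, n - m):
        v = solve_square(A_eq + [row for row, _ in active],
                         b_eq + [rhs for _, rhs in active])
        if v is None or any(not (lb[j] <= v[j] <= ub[j]) for j in range(n)):
            continue
        value = sum(cj * vj for cj, vj in zip(c, v))
        if best is None or value > best[0]:
            best = (value, v)
    return best

# Hypothetical toy network; reaction order: uptake, biomass, product, alt_sink.
# Rows: substrate A and a cofactor N that growth generates and the product consumes.
S = [[1, -1, -1, 0],    # A: uptake - biomass - product = 0
     [0, 1, -1, -1]]    # N: biomass - product - alt_sink = 0
UPTAKE, V_MAX, BIOMASS_TARGET = 10, 1000, 1

def inner_fba(knocked_out):
    """Inner problem: maximize the biomass flux with the given reactions removed."""
    lb = [UPTAKE, BIOMASS_TARGET, 0, 0]
    ub = [UPTAKE, V_MAX, V_MAX, V_MAX]
    for j in knocked_out:
        ub[j] = 0       # a deleted reaction can carry no flux
    return lp_max([0, 1, 0, 0], S, [0, 0], lb, ub)

def brute_force_knockouts(max_knockouts):
    """Outer problem by enumeration: knockout set with the highest product flux."""
    candidates = [1, 2, 3]      # the uptake reaction is not a knockout candidate
    best = None
    for k in range(max_knockouts + 1):
        for ko in combinations(candidates, k):
            result = inner_fba(ko)
            if result is None:  # mutant cannot meet the biomass target
                continue
            _, v = result
            if best is None or v[2] > best[1]:
                best = (ko, v[2], v[1])
    return best                 # (knockout indices, product flux, biomass flux)
```

In this toy case the wild-type network grows without making any product, while removing the alternative cofactor sink forces product formation whenever the network grows. Enumeration of this kind scales combinatorially, which is the motivation for the single-level reformulation developed in the text.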

The direct solution of this two-stage optimization problem is intractable given the
high dimensionality of the flux space (i.e., over 700 reactions) and the presence of two
nested optimization problems. To remedy this, we develop an efficient solution approach
borrowing from LP duality theory, which shows that for every linear programming problem
(primal) there exists a unique optimization problem (dual) whose optimal objective value is
equal to that of the primal problem. A similar strategy was employed by Burgard and
Maranas (2003) for identifying/testing metabolic objective functions from metabolic flux
data. The dual problem (Ignizio and Cavalier, 1994) associated with the OptKnock inner
problem is

$$
\begin{aligned}
\text{minimize} \quad & v_{atp\_main} \cdot \mu_{atp} + v_{biomass}^{target} \cdot \mu_{biomass} + v_{glc\_uptake} \cdot glc && \text{(Dual)} \\
\text{subject to} \quad & \sum_{i=1}^{N} \lambda_i^{stoich} S_{i,glk} + \mu_{glk} + glc = 0 \\
& \sum_{i=1}^{N} \lambda_i^{stoich} S_{i,pts} + \mu_{pts} + glc = 0 \\
& \sum_{i=1}^{N} \lambda_i^{stoich} S_{i,biomass} + \mu_{biomass} = 1 \\
& \sum_{i=1}^{N} \lambda_i^{stoich} S_{ij} + \mu_j = 0, && \forall\, j \in \mathcal{M},\ j \ne glk, pts, biomass \\
& \mu_j^{min} \cdot (1 - y_j) \le \mu_j \le \mu_j^{max} \cdot (1 - y_j), && \forall\, j \in \mathcal{M}_{rev} \text{ and } j \notin \mathcal{M}_{secr\_only} \\
& \mu_j \ge \mu_j^{min} \cdot (1 - y_j), && \forall\, j \in \mathcal{M}_{rev} \text{ and } j \in \mathcal{M}_{secr\_only} \\
& \mu_j \le \mu_j^{max} \cdot (1 - y_j), && \forall\, j \in \mathcal{M}_{irrev} \text{ and } j \in \mathcal{M}_{secr\_only} \\
& \mu_j \in \mathbb{R}, && \forall\, j \in \mathcal{M}_{irrev} \text{ and } j \notin \mathcal{M}_{secr\_only} \\
& \lambda_i^{stoich} \in \mathbb{R}, && \forall\, i \in \mathcal{N} \\
& glc \in \mathbb{R}
\end{aligned}
$$

where $\lambda_i^{stoich}$ is the dual variable associated with the stoichiometric constraints, glc is the
dual variable associated with the glucose uptake constraint, and $\mu_j$ is the dual variable
associated with any other restrictions on its corresponding flux $v_j$ in the Primal. Note that
the dual variable $\mu_j$ acquires unrestricted sign if its corresponding flux in the OptKnock
inner problem is set to zero by enforcing $y_j = 0$. The parameters $\mu_j^{min}$ and $\mu_j^{max}$ are identified
by minimizing and subsequently maximizing their values subject to the constraints of the Dual
problem.
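
The equality of primal and dual optimal objective values that underlies this transformation can be checked numerically on a small, unrelated LP. The two-variable problem below is a made-up example, and `max_2d` is a hypothetical helper that simply tests every intersection of two constraint boundaries; it assumes an optimum exists and is not meant as a general LP solver.

```python
from fractions import Fraction
from itertools import combinations

def max_2d(c, constraints):
    """Maximize c[0]*x + c[1]*y subject to a*x + b*y <= r for each (a, b, r).

    Checks every intersection of two constraint boundaries and keeps the best
    feasible one. Assumes the problem has a finite optimum.
    """
    best = None
    for (a1, b1, r1), (a2, b2, r2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue                      # parallel boundaries, no vertex
        x = Fraction(r1 * b2 - r2 * b1, det)
        y = Fraction(a1 * r2 - a2 * r1, det)
        if all(a * x + b * y <= r for a, b, r in constraints):
            value = c[0] * x + c[1] * y
            if best is None or value > best:
                best = value
    return best

# Primal: maximize 3*x1 + 2*x2
#         subject to x1 + x2 <= 4, x1 + 3*x2 <= 6, x1 >= 0, x2 >= 0
primal_opt = max_2d((3, 2), [(1, 1, 4), (1, 3, 6), (-1, 0, 0), (0, -1, 0)])

# Dual: minimize 4*y1 + 6*y2
#       subject to y1 + y2 >= 3, y1 + 3*y2 >= 2, y1 >= 0, y2 >= 0
# Written as a maximization of the negated objective with <= constraints.
dual_opt = -max_2d((-4, -6), [(-1, -1, -3), (-1, -3, -2), (-1, 0, 0), (0, -1, 0)])
```

Both problems reach the same optimal value, which is the property used to replace the inner maximization by a set of primal constraints, dual constraints, and one equality between the two objectives.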

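If the optimal solutions to the Primal and Dual problems are bounded, their
objective function values must be equal to one another at optimality. This means that every
optimal solution to both problems can be characterized by setting their objectives equal to
one another and accumulating their respective constraints. Thus the bilevel formulation for
OptKnock shown previously can be transformed into the following single-level MILP

$$
\begin{aligned}
\text{maximize} \quad & v_{chemical} && \text{(OptKnock)} \\
\text{subject to} \quad & v_{biomass} = v_{atp\_main} \cdot \mu_{atp} + v_{biomass}^{target} \cdot \mu_{biomass} + v_{glc\_uptake} \cdot glc \\
& \sum_{j=1}^{M} S_{ij} v_j = 0, && \forall\, i \in \mathcal{N} \\
& v_{pts} + v_{glk} = v_{glc\_uptake} \ \ \text{mmol/gDW}\cdot\text{hr} \\
& v_{atp} \ge v_{atp\_main} \ \ \text{mmol/gDW}\cdot\text{hr} \\
& \sum_{i=1}^{N} \lambda_i^{stoich} S_{i,glk} + \mu_{glk} + glc = 0 \\
& \sum_{i=1}^{N} \lambda_i^{stoich} S_{i,pts} + \mu_{pts} + glc = 0 \\
& \sum_{i=1}^{N} \lambda_i^{stoich} S_{i,biomass} + \mu_{biomass} = 1 \\
& \sum_{i=1}^{N} \lambda_i^{stoich} S_{ij} + \mu_j = 0, && \forall\, j \in \mathcal{M},\ j \ne glk, pts, biomass \\
& \sum_{j \in \mathcal{M}} (1 - y_j) \le K \\
& v_{biomass} \ge v_{biomass}^{target} \\
& \mu_j^{min} \cdot (1 - y_j) \le \mu_j \le \mu_j^{max} \cdot (1 - y_j), && \forall\, j \in \mathcal{M}_{rev} \text{ and } j \notin \mathcal{M}_{secr\_only} \\
& \mu_j \ge \mu_j^{min} \cdot (1 - y_j), && \forall\, j \in \mathcal{M}_{rev} \text{ and } j \in \mathcal{M}_{secr\_only} \\
& \mu_j \le \mu_j^{max} \cdot (1 - y_j), && \forall\, j \in \mathcal{M}_{irrev} \text{ and } j \in \mathcal{M}_{secr\_only} \\
& \mu_j \in \mathbb{R}, && \forall\, j \in \mathcal{M}_{irrev} \text{ and } j \notin \mathcal{M}_{secr\_only} \\
& v_j^{min} \cdot y_j \le v_j \le v_j^{max} \cdot y_j, && \forall\, j \in \mathcal{M} \\
& \lambda_i^{stoich} \in \mathbb{R}, && \forall\, i \in \mathcal{N} \\
& y_j \in \{0, 1\}, && \forall\, j \in \mathcal{M} \\
& glc \in \mathbb{R}
\end{aligned}
$$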



An important feature of the above formulation is that if the problem is feasible, the optimal
solution will always be found. In this invention, the candidates for gene knockouts
included, but are not limited to, all reactions of glycolysis, the TCA cycle, the pentose
phosphate pathway, respiration, and all anaplerotic reactions. This is accomplished by
limiting the number of reactions included in the summation (i.e.,
$\sum_{j \in Central} (1 - y_j) \le K$). Problems containing as many as 100 binary variables
were solved in the order of minutes to
hours using CPLEX 7.0 accessed via the GAMS modeling environment on an IBM
RS6000-270 workstation. It should be understood, however, that the present invention is
not dependent upon any particular type of computer or environment being used. Any type
can be used to allow for inputting and outputting the information associated with the
methodology of the present invention. Moreover, the steps of the methods of the present
invention can be implemented in any number of types of software applications, or languages,
and the present invention is not limited in this respect.

It will be appreciated that other embodiments and uses will be apparent to those
skilled in the art and that the invention is not limited to these specific illustrative examples.

EXAMPLE 1
Succinate and Lactate Production
This example identifies which reactions, if any, could be removed from the E. coli K-12
stoichiometric model (Edwards and Palsson, 2000) so that the remaining network produces
succinate or lactate whenever biomass maximization is a good descriptor of flux allocation.
A prespecified amount of glucose (10 mmol/gDW·hr), along with unconstrained
uptake routes for inorganic phosphate, oxygen, sulfate, and ammonia, are provided to fuel
the metabolic network. The optimization step could opt for or against the
phosphotransferase system, glucokinase, or both mechanisms for the uptake of glucose.
Secretion routes for acetate, carbon dioxide, ethanol, formate, lactate and succinate are also
enabled. Note that because the glucose uptake rate is fixed, the biomass and product yields
are essentially equivalent to the rates of biomass and product production, respectively. In
all cases, the OptKnock procedure eliminated the oxygen uptake reaction, pointing at
anaerobic growth conditions consistent with current succinate (Zeikus et al., 1999) and
lactate (Datta et al., 1995) fermentative production strategies.

Table I summarizes three of the identified gene knockout strategies for succinate
overproduction (i.e., mutants A, B, and C). The anaerobic flux distributions at the
maximum biomass yields for the complete E. coli network (i.e., wild-type), mutant B, and
mutant C are illustrated in Figure 2A-C. The results for mutant A suggested that the
removal of two reactions (i.e., pyruvate formate lyase and lactate dehydrogenase) from the
network results in succinate production reaching 63% of its theoretical maximum at the
maximum biomass yield. This knockout strategy is identical to the one employed by Stols
and Donnelly (Stols and Donnelly, 1997) in their succinate overproducing E. coli strain.
Next, the envelope of allowable succinate versus biomass production was explored for the
wild-type E. coli network and the three mutants listed in Table I. Note that the succinate
production limits, shown in Figure 3A, revealed that mutant A does not exhibit coupled
succinate and biomass formation until the yield of biomass approaches 80% of the
maximum. Mutant B, however, with the additional deletion of acetaldehyde
dehydrogenase, resulted in a much earlier coupling of succinate with biomass yields.

A less intuitive strategy was identified for mutant C, which focused on inactivating
two PEP consuming reactions rather than eliminating competing byproduct (i.e., ethanol,
formate, and lactate) production mechanisms. First, the phosphotransferase system was
disabled, requiring the network to rely exclusively on glucokinase for the uptake of glucose.
Next, pyruvate kinase was removed, leaving PEP carboxykinase as the only central
metabolic reaction capable of draining the significant amount of PEP supplied by
glycolysis. This strategy, assuming that the maximum biomass yield could be attained,
resulted in a succinate yield approaching 88% of the theoretical maximum. In addition,
Figure 3A revealed significant succinate production for every attainable biomass yield,
while the maximum theoretical yield of succinate is the same as that for the wild-type
strain.

The OptKnock framework was next applied to identify knockout strategies for
coupling lactate and biomass production. Table I shows three of the identified gene
knockout strategies (i.e., mutants A, B, and C) and the flux distribution of mutant C at the
maximum biomass yield is shown in Figure 2D. Mutant A redirects flux toward lactate at
the maximum biomass yield by blocking acetate and ethanol production. This result is
consistent with previous work demonstrating that an adh, pta mutant E. coli strain could
grow anaerobically on glucose by producing lactate (Gupta and Clark, 1989). Mutant B
provides an alternate strategy involving the removal of an initial glycolysis reaction along
with the acetate production mechanism. This results in a lactate yield of 90% of its
theoretical limit at the maximum biomass yield. The vertical red line for mutant B in
Figure 3B indicates that the network could avoid producing lactate while maximizing
biomass formation. This is due to the fact that OptKnock does not explicitly account for
the "worst-case" alternate solution. It should be appreciated that upon the additional
elimination of the glucokinase and ethanol production mechanisms, mutant C exhibited a
tighter coupling between lactate and biomass production.

Table I - Biomass and chemical yields for various gene knockout strategies identified by
OptKnock. The reactions and corresponding enzymes for each knockout strategy are listed.
The maximum biomass and corresponding chemical yields are provided on a basis of 10
mmol/hr glucose fed and 1 gDW of cells. The rightmost column provides the chemical
yields for the same basis assuming a minimal redistribution of metabolic fluxes from the
wild-type (undeleted) E. coli network (MOMA assumption). For the 1,3-propanediol case,
glycerol secretion was disabled for both knockout strategies.

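Succinate

                                                                          max v_biomass           min Σ(v0_j - v_j)^2
ID  Knockouts                                 Enzyme                      Biomass    Succinate    Succinate
                                                                          (1/hr)     (mmol/hr)    (mmol/hr)

A   1  COA + PYR → ACCOA + FOR                Pyruvate formate lyase
    2  NADH + PYR ↔ LAC + NAD                 Lactate dehydrogenase
B   1  COA + PYR → ACCOA + FOR                Pyruvate formate lyase
    2  NADH + PYR ↔ LAC + NAD                 Lactate dehydrogenase
    3  ACCOA + 2 NADH ↔ COA + ETH + 2 NAD     Acetaldehyde dehydrogenase
C   1  ADP + PEP → ATP + PYR                  Pyruvate kinase
    2  ACTP + ADP ↔ AC + ATP or               Acetate kinase
       ACCOA + Pi ↔ ACTP + COA                Phosphotransacetylase
    3  GLC + PEP → G6P + PYR                  Phosphotransferase system

Lactate

                                                                          max v_biomass           min Σ(v0_j - v_j)^2
ID  Knockouts                                 Enzyme                      Biomass    Lactate      Lactate
                                                                          (1/hr)     (mmol/hr)    (mmol/hr)

A   1  ACTP + ADP ↔ AC + ATP or               Acetate kinase              0.28       10.46
       ACCOA + Pi ↔ ACTP + COA                Phosphotransacetylase
    2  ACCOA + 2 NADH ↔ COA + ETH + 2 NAD     Acetaldehyde dehydrogenase
B   1  ACTP + ADP ↔ AC + ATP or               Acetate kinase              0.13       18.00
       ACCOA + Pi ↔ ACTP + COA                Phosphotransacetylase
    2  ATP + F6P → ADP + FDP or               Phosphofructokinase
       FDP ↔ T3P1 + T3P2                      Fructose-1,6-bisphosphate aldolase
C   1  ACTP + ADP ↔ AC + ATP or               Acetate kinase              0.12       18.13
       ACCOA + Pi ↔ ACTP + COA                Phosphotransacetylase
    2  ATP + F6P → ADP + FDP or               Phosphofructokinase
       FDP ↔ T3P1 + T3P2                      Fructose-1,6-bisphosphate aldolase
    3  ACCOA + 2 NADH ↔ COA + ETH + 2 NAD     Acetaldehyde dehydrogenase
    4  GLC + ATP → G6P + ADP                  Glucokinase

1,3-Propanediol

                                                                          max v_biomass           min Σ(v0_j - v_j)^2
ID    Knockouts                               Enzyme                      Biomass    PDO          PDO
                                                                          (1/hr)     (mmol/hr)    (mmol/hr)

Wild  "Complete network"
A   1  FDP → F6P + Pi or                      Fructose-1,6-bisphosphatase
       FDP ↔ T3P1 + T3P2                      Fructose-1,6-bisphosphate aldolase
    2  13PDG + ADP ↔ 3PG + ATP or             Phosphoglycerate kinase
       NAD + Pi + T3P1 ↔ 13PDG + NADH         Glyceraldehyde-3-phosphate dehydrogenase
    3  GL + NAD ↔ GLAL + NADH
B   1  T3P1 ↔ T3P2                            Triose phosphate isomerase
    2  G6P + NADP ↔ D6PGL + NADPH or          Glucose-6-phosphate dehydrogenase
       D6PGL → D6PGC                          6-Phosphogluconolactonase
    3  DR5P → ACAL + T3P1                     Deoxyribose-phosphate aldolase
    4  GL + NAD ↔ GLAL + NADH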
EXAMPLE 2
1,3-Propanediol (PDO) Production
In addition to devising optimum gene knockout strategies, OptKnock was used to 
design strains where gene additions ware needed along with gene deletions such as in PDO 

5 production in & coli Although microbial 1,3-piopmediol (PDO) production methods 
haw been developed utilizing glycerol as the primary carbon source (Hartlep ei ai 5 2002; 
Zhu et ai„ 2002), the production of i>propanediol directly from glucose in. a single 
microorganism has recently attracted considerable interest (Cameron et al., 1998; Biebl et 
al, 1999; Zeng and Biebl s 2002). Because wild-type B. coli lacks the pathway necessary 

1 0 for PDO production, the gene addition framework was first employed (Burgard and 
Maranas, 2001) to identify the additional reactions needed for producing PDO from 
glucose in B, coli. The gene addition framework identified a straightforward three-reaction 
pathway involving the conversion of glycerol-3-P to glycerol by glycerol phosphatase, 
followed by the conversion of glycerol to 1,3 propanediol by glycerol dehydratase and 1,3- 

1 5 propanediol oxidoreductase. These reactions were then added to the E. coli stoichiometric 
model and the OptKnock procedure was subsequently applied. 

OptKnock revealed that there was neither a single nor a double deletion mutant
with coupled PDO and biomass production. However, one triple and multiple quadruple
knockout strategies that can couple PDO production with biomass production were
identified. Two of these knockout strategies are shown in Table 1. The results suggested
that the removal of certain key functionalities from the E. coli network resulted in PDO
overproducing mutants for growth on glucose. Specifically, Table 1 reveals that the
removal of two glycolytic reactions along with an additional knockout preventing the
degradation of glycerol yields a network capable of reaching 72% of the theoretical
maximum yield of PDO at the maximum biomass yield. Note that the
glyceraldehyde-3-phosphate dehydrogenase (gapA) knockout was used by DuPont in their
PDO-overproducing E. coli strain (Nakamura, 2002). Mutant B revealed an alternative
strategy, involving the removal of the triose phosphate isomerase (tpi) enzyme, exhibiting
a similar PDO yield and a 38% higher biomass yield. Interestingly, a yeast strain deficient
in triose phosphate isomerase activity was recently reported to produce glycerol, a key
precursor to PDO, at 80-90% of its maximum theoretical yield (Compagno et al., 1996).
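
The search over single, double, and larger deletion sets described above can be illustrated on a deliberately tiny, hypothetical network. The sketch below is not the OptKnock formulation, which solves a bilevel mixed-integer program. It simply enumerates knockout sets, lets the "cell" maximize biomass on an integer flux grid, and reports the product flux that is guaranteed at that biomass optimum. The network (glucose split into DHAP and GAP, a reversible tpi step, and two drain reactions), the names d_drain and g_drain, and the uptake limit of 10 are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical toy network (assumption, not the E. coli model):
#   uptake:  glucose -> DHAP + GAP        (0..MAX_UPTAKE)
#   tpi:     DHAP <-> GAP                 (reversible)
#   product: DHAP -> product              d_drain: DHAP -> lost carbon
#   biomass: GAP  -> biomass              g_drain: GAP  -> lost carbon
CANDIDATES = ("tpi", "d_drain", "g_drain")
MAX_UPTAKE = 10


def feasible_fluxes(knockouts):
    """Yield every steady-state (biomass, product) pair on an integer grid."""
    tpi_values = [0] if "tpi" in knockouts else range(-MAX_UPTAKE, MAX_UPTAKE + 1)
    for uptake in range(MAX_UPTAKE + 1):
        for tpi in tpi_values:
            dhap_out = uptake - tpi  # DHAP leaving as product or via d_drain
            gap_out = uptake + tpi   # GAP leaving as biomass or via g_drain
            if dhap_out < 0 or gap_out < 0:
                continue
            # A knocked-out drain forces all flux into the remaining outlet.
            products = [dhap_out] if "d_drain" in knockouts else range(dhap_out + 1)
            biomasses = [gap_out] if "g_drain" in knockouts else range(gap_out + 1)
            for product in products:
                for biomass in biomasses:
                    yield biomass, product


def guaranteed_product(knockouts):
    """Inner problem: maximize biomass, then take the worst-case product."""
    points = list(feasible_fluxes(knockouts))
    best_biomass = max(b for b, _ in points)
    return best_biomass, min(p for b, p in points if b == best_biomass)


# Outer problem, solved here by brute force over all knockout sets.
results = {}
for size in range(len(CANDIDATES) + 1):
    for knockouts in combinations(CANDIDATES, size):
        results[knockouts] = guaranteed_product(set(knockouts))

# Smallest knockout set whose biomass optimum forces product formation.
smallest_coupled = min(
    (ko for ko, (_, product) in results.items() if product > 0), key=len
)
```

In this toy case no single deletion couples product to growth, while removing both tpi and the DHAP drain does, which loosely mirrors the pattern reported above where coupling only appears once several deletions are combined.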



WO 2004/018621 PCT/US2003/021598

The flux distributions of the wild-type E. coli, mutant A, and mutant B networks
that maximize the biomass yield are available in Figure 4. Not surprisingly, further
conversion of glycerol to glyceraldehyde was disrupted in both mutants A and B. For
mutant A, the removal of two reactions from the top and bottom parts of glycolysis resulted
in a nearly complete inactivation of the pentose phosphate and glycolysis (with the
exception of triose phosphate isomerase) pathways. To compensate, the Entner-Doudoroff
glycolysis pathway is activated to channel flux from glucose to pyruvate and
glyceraldehyde-3-phosphate (GAP). GAP is then converted to glycerol, which is
subsequently converted to PDO. Energetic demands lost with the decrease in glycolytic
fluxes from the wild-type E. coli network case are now met by an increase in the TCA
cycle fluxes. The knockouts suggested for mutant B redirect flux toward the production of
PDO by a distinctly different mechanism. The removal of the initial pentose phosphate
pathway reaction results in the complete flow of metabolic flux through the first steps of
glycolysis. At the fructose bisphosphate aldolase junction, the flow is split into the two
product metabolites: dihydroxyacetone-phosphate (DHAP), which is converted to PDO, and
GAP, which continues through the second half of glycolysis. The removal of the
triose-phosphate isomerase reaction prevents any interconversion between DHAP and GAP.
Interestingly, a fourth knockout is predicted to retain the coupling between biomass
formation and chemical production. This knockout prevents the "leaking" of flux through
a complex pathway involving 15 reactions that together convert ribose-5-phosphate (R5P)
to acetate and GAP, a leak that would otherwise decouple growth from chemical production.

Next, the envelope of allowable PDO production versus biomass yield is explored
for the two mutants listed in Table 1. The production limits of the mutants along with the
original E. coli network, illustrated in Figure 5, reveal that the wild-type E. coli network
has no "incentive" to produce PDO if the biomass yield is to be maximized. On the other
hand, both mutants A and B have to produce significant amounts of PDO if any amount of
biomass is to be formed given the reduced functionalities of the network following the gene
removals. Mutant A, by avoiding the tpi knockout that essentially sets the ratio of biomass
to PDO production, is characterized by a higher maximum theoretical yield of PDO. The
above described results hinge on the use of glycerol as a key intermediate to PDO. Next,
the possibility of utilizing an alternative to the glycerol conversion route for
1,3-propanediol production was explored.
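
The idea of a production envelope can be made concrete with a small sketch. It assumes a hypothetical three-outlet network (glucose split into DHAP and GAP, a reversible tpi step, DHAP going only to product, GAP going to biomass or waste) and replaces the linear programs that would normally trace the envelope with an exhaustive integer grid. The function name, the flux limit of 10, and the network itself are illustrative assumptions and are not taken from the E. coli model.

```python
def production_envelope(tpi_values, max_uptake=10):
    """Map each attainable biomass flux to its (min, max) product flux.

    Toy stoichiometry (assumption): uptake makes one DHAP and one GAP,
    tpi converts DHAP to GAP (negative values run in reverse),
    DHAP leaves only as product, GAP leaves as biomass or waste.
    """
    envelope = {}
    for uptake in range(max_uptake + 1):
        for tpi in tpi_values:
            product = uptake - tpi   # DHAP balance
            gap_pool = uptake + tpi  # GAP available for biomass or waste
            if product < 0 or gap_pool < 0:
                continue
            for biomass in range(gap_pool + 1):
                low, high = envelope.get(biomass, (product, product))
                envelope[biomass] = (min(low, product), max(high, product))
    return envelope


# Wild type: tpi may carry flux in either direction.
wild_type = production_envelope(range(-10, 11))
# Mutant: tpi deleted, so its flux is fixed at zero.
tpi_mutant = production_envelope([0])

for biomass in (0, 5, 10):
    print(biomass, wild_type[biomass], tpi_mutant[biomass])
```

In this toy network the wild type can reach its highest biomass flux with no product at all, whereas the tpi mutant cannot form any biomass without at least an equal product flux. This is the same qualitative picture described for Figure 5, although the numbers carry no biological meaning.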

Applicants identified a pathway in Chloroflexus aurantiacus involving a two-step
NADPH-dependent reduction of malonyl-CoA to generate 3-hydroxypropionic acid
(3-HPA) (Menendez et al., 1999; Hugler et al., 2002). 3-HPA could then be subsequently
converted chemically to 1,3-propanediol given that there is no biological functionality to
achieve this transformation. This pathway offers a key advantage over PDO production
through the glycerol route because its initial step (acetyl-CoA carboxylase) is a carbon
fixing reaction. Accordingly, the maximum theoretical yield of 3-HPA (1.79 mmol/mmol
glucose) is considerably higher than for PDO production through the glycerol conversion
route (1.34 mmol/mmol glucose). The application of the OptKnock framework upon the
addition of the 3-HPA production pathway revealed that many more knockouts are required
before biomass formation is coupled with 3-HPA production. One of the most interesting
strategies involves nine knockouts yielding 3-HPA production at 91% of its theoretical
maximum at optimal growth. The first three knockouts were relatively straightforward as
they involved removal of competing acetate, lactate, and ethanol production mechanisms.
In addition, the Entner-Doudoroff pathway (either phosphogluconate dehydratase or
2-keto-3-deoxy-6-phosphogluconate aldolase), four respiration reactions (i.e., NADH
dehydrogenase I, NADH dehydrogenase II, glycerol-3-phosphate dehydrogenase, and the
succinate dehydrogenase complex), and an initial glycolysis step (i.e., phosphoglucose
isomerase) are disrupted. This strategy resulted in a 3-HPA yield that, assuming the
maximum biomass yield, is 69% higher than the previously identified mutants utilizing the
glycerol conversion route.
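
The yield figures quoted above can be cross-checked with a few lines of arithmetic. The sketch assumes that the "69% higher" comparison is between 91% of the 3-HPA theoretical maximum and the 72% of the PDO theoretical maximum reported earlier for the glycerol-route mutant. The text does not state the baseline explicitly, so this pairing is an assumption.

```python
# Theoretical maximum yields quoted in the text (mmol per mmol glucose).
MAX_YIELD_3HPA = 1.79
MAX_YIELD_PDO = 1.34

# Fractions of the theoretical maximum reached at maximum biomass yield.
FRACTION_3HPA = 0.91  # nine-knockout 3-HPA strategy
FRACTION_PDO = 0.72   # glycerol-route mutant (assumed comparison baseline)

yield_3hpa = FRACTION_3HPA * MAX_YIELD_3HPA
yield_pdo = FRACTION_PDO * MAX_YIELD_PDO

# Relative advantage of the 3-HPA route at maximum biomass yield.
relative_gain = yield_3hpa / yield_pdo - 1.0

# Advantage in theoretical maximum alone, before any knockouts.
theoretical_gain = MAX_YIELD_3HPA / MAX_YIELD_PDO - 1.0

print(f"3-HPA yield: {yield_3hpa:.3f}, PDO yield: {yield_pdo:.3f}")
print(f"relative gain: {relative_gain:.1%}, theoretical gain: {theoretical_gain:.1%}")
```

Under that assumption the ratio comes out at roughly 69%, consistent with the figure stated in the text, and about half of that advantage is already present in the theoretical maxima.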



EXAMPLES

Alternative Cellular Objective: Minimization of Metabolic Adjustment

All results described previously were obtained by invoking the maximization of
biomass yield as the cellular objective that drives flux allocation. This hypothesis
essentially assumes that the metabolic network could arbitrarily change and/or even rewire
regulatory loops to maintain biomass yield maximality under changing environmental
conditions (maximal response). Recent evidence suggests that this is sometimes achieved
by the K-12 strain of E. coli after multiple cycles of growth selection (Ibarra et al., 2002).
In this section, a contrasting hypothesis was examined (i.e., minimization of metabolic
adjustment (MOMA) (Segre et al., 2002)) that assumed a myopic (minimal) response by
the metabolic network upon gene deletions. Specifically, the MOMA hypothesis suggests
that the metabolic network will attempt to remain as close as possible to the original steady
state of the system rendered unreachable by the gene deletion(s). This hypothesis has been
shown to provide a more accurate description of flux allocation immediately after a gene
deletion event (Segre et al., 2002). Figure 6 pictorially shows the two differing new steady
states predicted by the two hypotheses, respectively. For this study, the MOMA objective
was utilized to predict the flux distributions in the mutant strains identified by OptKnock.
The base case for the lactate and succinate simulations was assumed to be maximum
biomass formation under anaerobic conditions, while the base case for the PDO
simulations was maximum biomass formation under aerobic conditions. The results are
shown in the last column of Table 1. In all cases, the suggested multiple gene knockout
strategy shows only slightly lower chemical production yields for the MOMA case
compared to the maximum biomass hypothesis. This implies that the OptKnock results are
fairly robust with respect to the choice of cellular objective.
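
The minimization of metabolic adjustment idea can be sketched on a toy example. The code below finds the flux vector closest, in the Euclidean sense, to a wild-type flux vector while satisfying the mass balance and a knockout constraint. It is a simplified sketch: a full MOMA calculation is a quadratic program that also enforces flux bounds, whereas this version handles only equality constraints through the closed-form Lagrange solution. The one-metabolite network, the wild-type fluxes, and all function names are hypothetical.

```python
def solve_linear(matrix, rhs):
    """Solve a small dense system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [[float(x) for x in row] + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("constraint rows are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def moma_fluxes(constraints, wild_type):
    """Return the flux vector nearest to wild_type with constraints . v = 0.

    Uses v = w - A^T (A A^T)^-1 A w, the projection of w onto the
    null space of the constraint matrix A. Flux bounds are ignored.
    """
    residual = [sum(a * w for a, w in zip(row, wild_type)) for row in constraints]
    gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in constraints]
            for r1 in constraints]
    multipliers = solve_linear(gram, residual)
    return [w - sum(m * row[j] for m, row in zip(multipliers, constraints))
            for j, w in enumerate(wild_type)]


# Reaction order (toy network): uptake, biomass, product, byproduct.
mass_balance = [1, -1, -1, -1]       # one internal metabolite at steady state
byproduct_knockout = [0, 0, 0, 1]    # deleted reaction carries no flux
wild_type = [10.0, 8.0, 0.0, 2.0]    # assumed wild-type flux distribution

mutant = moma_fluxes([mass_balance, byproduct_knockout], wild_type)
print([round(v, 3) for v in mutant])
```

In this toy case the flux lost from the deleted byproduct reaction is spread across the remaining reactions, so a small product flux appears. A maximum-biomass prediction for the same mutant would instead send all uptake to biomass, which illustrates how the two hypotheses can predict different post-deletion steady states.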

The publications and other material used herein to illuminate the background of the 
invention or provide additional details respecting the practice, are herein incorporated by 
reference in their entirety. 

The present invention contemplates numerous variations, including variations in
organisms, variations in cellular objectives, variations in bioengineering objectives,
variations in types of optimization problems formed and solutions used.

These and/or other variations, modifications or alterations may be made therein
without departing from the spirit and the scope of the invention as set forth in the appended
claims.

What is claimed is:

1. A method for determining candidates for gene deletions and additions using a
model of a metabolic network associated with an organism, the model comprising a
plurality of metabolic reactions defining metabolite relationships, the method comprising:
selecting a bioengineering objective for the organism; selecting at least one cellular
objective; forming an optimization problem that couples the at least one cellular objective
with the bioengineering objective; and solving the optimization problem to yield at least
one candidate.

2. The method of claim 1 further comprising modifying the organism with the
candidate.

3. The method of claim 1 wherein the bioengineering objective is overproduction of a
chemical.

4. The method of claim 1 wherein the bioengineering objective is underproduction of
a chemical.

5. The method of claim 1 wherein the cellular objective is growth.

6. The method of claim 1 wherein the cellular objective is minimization of metabolic
adjustment.

7. The method of claim 1 wherein the candidate is a candidate for gene deletion, and
the optimization problem includes a binary value for specifying if reaction flux is active or
inactive.

8. The method of claim 1 wherein the optimization problem is a bilevel optimization
problem.

9. The method of claim 1 wherein the optimization problem is a mixed-integer

10. The method of claim 1 wherein the optimization problem includes at least one
stoichiometric constraint.

11. The method of claim 1 wherein the optimization problem includes at least one
chemical uptake constraint.

12. The method of claim 1 wherein the step of forming an optimization problem
includes quantifying the cellular objective as an aggregate reaction flux.

13. The method of claim 1 further comprising evaluating performance limits of the
metabolic network with the at least one candidate based on ability of the network to meet
the at least one cellular objective.

14. The method of claim 1 wherein the cellular objective is selected from the group
consisting of: maximizing a growth rate, maximizing ATP production, minimizing
metabolic adjustment, minimizing nutrient uptake, minimizing redox production,
minimizing a Euclidean norm, and combinations thereof.

15. The method of claim 1 wherein the bioengineering objective is overproduction of
glycerol and at least one candidate is for gene deletion and comprising genes coding for the
enzymes fructose-1,6-bisphosphatase, fructose-1,6-bisphosphate aldolase,
phosphoglycerate kinase, glyceraldehyde-3-phosphate dehydrogenase,
phosphoenolpyruvate synthase, NADH dehydrogenase I, phosphogluconate dehydratase,
2-keto-3-deoxy-6-phosphogluconate aldolase, triose phosphate isomerase,
glucose 6-phosphate-1-dehydrogenase, 6-phosphogluconolactonase, deoxyribose-phosphate
aldolase, aldehyde dehydrogenase, or combinations thereof.

16. The method of claim 1 wherein the bioengineering objective is overproduction of
1,3-propanediol and at least one candidate is for gene deletion and comprising genes
coding for the enzymes fructose-1,6-bisphosphatase, fructose-1,6-bisphosphate aldolase,
phosphoglycerate kinase, glyceraldehyde-3-phosphate dehydrogenase, triose phosphate
isomerase, glucose 6-phosphate-1-dehydrogenase, 6-phosphogluconolactonase,
deoxyribose-phosphate aldolase, aldehyde dehydrogenase, or combinations thereof.

17. The method of claim 1 wherein the bioengineering objective is overproduction of
succinate and at least one candidate is for gene deletion and comprising genes coding for
the enzymes pyruvate formate lyase, acetaldehyde dehydrogenase, pyruvate kinase,
F0F1-ATPase, NADH dehydrogenase I, fumarase, D-lactate dehydrogenase, pyridine nucleotide
transhydrogenase, phosphotransacetylase, acetate kinase, phosphotransferase, or
combinations thereof.

18. The method of claim 1 wherein the bioengineering objective is overproduction of
lactate and at least one candidate is for gene deletion and comprising genes coding for the
enzymes phosphotransacetylase, acetate kinase, phosphofructokinase,
fructose-1,6-bisphosphate aldolase, triose phosphate isomerase, acetaldehyde dehydrogenase,
glucokinase, or combinations thereof.

19. A computer-based method for determining candidates for gene deletions and
additions using a model of a metabolic network associated with an organism, the model
comprising a plurality of metabolic reactions defining metabolite relationships, the method
comprising: inputting at least one bioengineering objective; receiving as input at least one
cellular objective; forming an optimization that quantifies the cellular objective as an
aggregate reaction flux and couples the at least one cellular objective with the
bioengineering objective; solving the optimization problem to yield at least one candidate;
and outputting the at least one candidate.